The gene encoding the membrane (M) protein of the 
OC43 strain of human coronavirus (HCV-OC43) was 
amplified by a reverse transcription-polymerase chain 
reaction of viral RNA with HCV-OC43- and bovine 
coronavirus (BCV)-specific primers. The nucleotide 
sequence of the cloned 1.5 kb fragment revealed an 
open reading frame (ORF) of 690 nucleotides which 
was identified as the M protein gene from its homology 
to BCV. This ORF encodes a protein of 230 amino 
acids with an Mr of 26416. The gene is preceded by the 
motif UCCAAAC, analogous to the consensus corona- 
virus transcription initiation sequence. The M protein 
of HCV-OC43 shows features typical of all coronavirus 
M proteins studied: a hydrophilic, presumably external 
N terminus including about 10% of the protein, and a 
potential N-glycosylation site followed by three major 
hydrophobic transmembrane domains. The amino acid 
sequence of the M protein of HCV-OC43 has 94% 
identity with that of the Mebus strain of BCV, and also 
contains six potential O-glycosylation sites in the 
exposed N-terminal domain. Indeed, the glycosylation 
of the M protein was not inhibited in the presence of 
tunicamycin, which is indicative of O-glycosylation, as 
previously reported for BCV and murine hepatitis 
virus. Virions released from tunicamycin-treated cells 
contained the M glycoprotein but were devoid of both 
peplomer (S) and haemagglutinin-esterase (HE) pro- 
teins. Thus, inhibition of the N-glycosylation of the S 
and HE structural proteins prevented their incorpora- 
tion into progeny virions, an indication that they are 
dispensable for virion morphogenesis, unlike the M 
protein. 


Human coronaviruses (HCVs) are grouped into two 
major antigenic clusters, represented by the prototype 
strains 229E and OC43 (Siddell et al., 1983). HCV-OC43 
shares antigenic relationships with other coronaviruses 
such as mouse hepatitis virus (MHV), rat sialodacrya- 
denitis virus, porcine haemagglutinating encephalo- 
myelitis virus, bovine coronavirus (BCV) and rabbit 
coronavirus. HCV-229E is the prototype strain of 
another antigenic group which includes porcine trans- 
missible gastroenteritis virus (TGEV), porcine respir- 
atory coronavirus, canine coronavirus, feline enteric 
coronavirus and feline infectious peritonitis virus 
(FIPV). 

HCVs are recognized as the causative agents of 
respiratory diseases, being responsible for about 15% of 
common colds (McIntosh, 1974). Other disease associ- 
ations have been suggested but are less well documented, 
for example the involvement of HCV in severe diarrhoea 
(Resta et al., 1985) or neurological disease such as 
multiple sclerosis (Burks et al., 1980; Weiss, 1983). 


The nucleotide sequence data reported in this paper will appear in 
the EMBL and GenBank Nucleotide Sequence Databases under the 
accession number M93390. 
MHVs show diverse tropisms, and some MHV strains in 
rodents have been used as a model system to study 
chronic and acute hepatic and neurological diseases. 

Coronaviruses contain a capped and polyadenylated 
positive-sense ssRNA molecule of 27 to 31 kb (Boursnell 
et al., 1987; Lee et al., 1991). Virus-specific mRNAs in 
infected cells comprise a genomic-sized mRNA plus four 
to eight subgenomic mRNA species. These mRNAs are 
arranged in a 3’-coterminal nested set structure, in which 
the sequence of each mRNA is contained within the 
sequence of the next larger mRNA. The mRNAs appear 
to be formed by a mechanism of leader-primed transcrip- 
tion, and a consensus intergenic sequence is the proposed 
site of fusion of the leader sequence with the mRNA 
coding region (Lai, 1990). 

Previous studies have identified four HCV-OC43 
structural proteins: a 190K peplomer (S) glycoprotein 
(normally present as subunits of 120K and 100K), a 
130K haemagglutinin-esterase (HE) glycoprotein, a 55K 
nucleocapsid (N) phosphoprotein and a 26K membrane 
(M) glycoprotein (Hogue & Brian, 1986). Various studies 
have reported remarkable antigenic and genomic similarities 
between HCV-OC43 and BCV (Hogue et al., 1984; Lapps & Brian, 
1985; Kamahora et al., 1989; Zhang et al., 1992). 

Fig. 1. Complete nucleotide sequence of the M protein gene of HCV-OC43 and its deduced amino acid sequence. The intergenic 
consensus sequences are underlined. The potential N-glycosylation (small circle) and surface-accessible O-glycosylation (asterisks) sites 
are indicated. 

As part of our ongoing studies on the molecular 
characterization of HCVs and their possible involve- 
ment in neurological diseases (Arpin & Talbot, 1990; 
Jouvenne et al., 1990, 1992; Talbot & Jouvenne, 1992), 
we now report the nucleotide sequence of the gene 
encoding the M protein of HCV-OC43. Its predicted 
amino acid sequence is compared with sequences 
determined for other coronaviruses and shown to be 
closely similar to that of BCV. Moreover, it is also O- 
glycosylated. 

HCV-OC43 was obtained from the ATCC and 
propagated at 37 °C on the HRT-18 human rectal tumour 
cell line. Cells were grown as described previously 
(Jouvenne et al., 1992) except that 10 units/ml 
TPCK-trypsin (Sigma) was added and infections (m.o.i. 0.2) 
were done at 37 °C. 

Viral mRNA (100 ng) prepared from infected cells 
(Chirgwin et al., 1979) was reverse transcribed using 
antisense primer 5' TCGGCCCACTTGAGGATG 3', 
complementary to nucleotides 147 to 165 of the 
HCV-OC43 N gene (Kamahora et al., 1989). The cDNAs were 
amplified with a sense primer 5' CTGGACACCAGGAGTTAG 3', 
located in the 3' region of the S gene of 
BCV [nucleotides 290 to 308 (Abraham et al., 1990)] and 
the antisense primer, using the polymerase chain 
reaction (PCR; Stewart et al., 1992). Two different 
purified 1.5 kb PCR products were cloned into the 
pBluescript II SK(+) vector (Stratagene), and unidirec- 
tional deletions were created using exonuclease III and 
mung bean nuclease (Stratagene). Sequencing was 
performed on both PCR products by the dideoxynucleo- 
tide chain termination method (Sanger & Coulson, 1975) 
using T7 DNA polymerase (Pharmacia) and [35S]dATP 
(Amersham). No mismatched bases, additions or dele- 
tions were found between the two clones. Sequence 
analyses, including hydropathy plots (Kyte & Doolittle, 
1982), were performed on an Apple Macintosh computer 
with the MacVector 3.5 (International Biotechnologies) 
and GeneWorks 2.0 (IntelliGenetics) sequence analysis 
programs. 

To study the effect of tunicamycin on the glycosylation 
of the viral glycoproteins, a final concentration of 
5 µg/ml tunicamycin (Boehringer Mannheim) was added 
to infected cells and maintained throughout infection. At 
4 h post-infection, 2.5 mCi [35S]methionine/cysteine 
(Trans35S-Label, ICN Biomedical) was added, after a 
1 h methionine and cysteine deprivation period. After 
2 h, unlabelled methionine and cysteine were added, as 
was foetal bovine serum, and infection was allowed to 
proceed for another 39 h. Radiolabelled virions produced 
in the presence or absence of tunicamycin were 
purified on Renografin-60 gradients and structural 
proteins were analysed as described previously (Arpin & 
Talbot, 1990). 

Fig. 2. Amino acid sequence comparison of the HCV-OC43 M protein with that of other coronavirus strains [BCV (Lapps et al., 1987), 
TCV (Verbeek & Tijssen, 1991), MHV-A59 (Armstrong et al., 1984), MHV-JHM (Pfleiderer et al., 1986), TGEV (Laude et al., 1987), 
FIPV (Vennema et al., 1991), HCV-229E (Jouvenne et al., 1990) and IBV (Boursnell et al., 1984)] by alignment for maximum identity. 
Dots indicate residues identical to those of HCV-OC43; hyphens represent gaps introduced into the sequence. The analysis was 
performed with the GeneWorks 2.0 program (IntelliGenetics) using default settings. 

The complete nucleotide sequence of the HCV-OC43 
M gene and its predicted amino acid sequence are shown 
in Fig. 1, together with potential glycosylation sites and 
intergenic consensus sequences. The largest open read- 
ing frame (ORF) (nucleotides 33 to 725) encodes a 
protein of 230 amino acids with a predicted Mr of 26416, 
consistent with the estimated Mr of the M protein of 
HCV-OC43 determined by SDS-PAGE (Hogue & 
Brian, 1986; Schmidt & Kenny, 1982; Fig. 3). 
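
The arithmetic behind these figures can be checked with a short sketch. It assumes, which the text does not state explicitly, that the quoted span (nucleotides 33 to 725) is inclusive and contains the stop codon; the function name orf_summary is illustrative only.

```python
# Coordinates and predicted Mr as quoted in the text.
ORF_START = 33       # first nucleotide of the ORF (1-based, inclusive)
ORF_END = 725        # last nucleotide of the ORF (1-based, inclusive)
PREDICTED_MR = 26416


def orf_summary(start: int, end: int) -> dict:
    """Summarise an inclusive ORF span.

    Assumption: the span ends with a stop codon, which encodes no residue.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    codons = span // 3
    return {
        "span_nt": span,          # nucleotides including the stop codon
        "coding_nt": span - 3,    # nucleotides that encode residues
        "codons": codons,
        "residues": codons - 1,   # stop codon excluded
    }


summary = orf_summary(ORF_START, ORF_END)
mean_residue_mass = PREDICTED_MR / summary["residues"]
print(summary, round(mean_residue_mass, 1))
```

Under that assumption the span covers 693 nucleotides, that is, 690 coding nucleotides plus one stop codon, which agrees with the 230 residues reported and gives a mean residue mass of roughly 115.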

As shown in Fig. 2, the M protein of HCV-OC43 is 
very similar to the corresponding protein of the Mebus 
strain of BCV (Lapps et al., 1987), which is antigenically 
related. Indeed, extensive identity exists between the 
HCV-OC43 and both BCV and turkey coronavirus 
(TCV) M proteins at the amino acid level (94%). An 
identity of 83 to 84% is found between the M proteins of 
HCV-OC43 and the A59 and JHM strains of MHV, 
which belong to the same antigenic group. On the other 
hand, the M proteins of the antigenically distinct TGEV, 
FIPV, HCV-229E and infectious bronchitis virus (IBV) 
show only 37, 35, 32 and 26% identity to that of HCV- 
OC43, respectively. The M proteins of both HCV-OC43 
and BCV are composed of the same number of residues. 
Moreover, they also possess identical numbers of basic 
and acidic amino acids, and are predicted to have similar 
Mr values. The M protein of HCV-OC43 contains three 
cysteine residues (positions 74, 126 and 183), whereas 
BCV lacks the C-terminal one of these. An intergenic 
sequence, UCCAAAC, identical to the one observed in 
front of the BCV M gene (Lapps et al., 1987) and some 
other coronavirus genes [BCV mRNAs 4 and 5 (Abra- 
ham et al., 1990); MHV-A59 M gene (Armstrong et al., 
1984); MHV-JHM M gene (Pfleiderer et al., 1986)] is 
present 10 nucleotides upstream of the predicted 
initiation codon of the HCV-OC43 M protein (Fig. 1). 
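
For readers who wish to reproduce identity figures of this kind from an alignment, a minimal sketch follows. It is not the GeneWorks procedure, whose scoring details are not given here; it assumes that gapped columns are skipped and that identity is expressed over the remaining columns. The two strings are hypothetical toy sequences, not the M protein sequences.

```python
def percent_identity(ref: str, other: str, gap: str = "-") -> float:
    """Percentage of identical residues over ungapped alignment columns.

    Assumption: columns with a gap in either sequence are ignored.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(ref, other):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned toy sequences (one gap, one substitution).
toy_ref = "MSSVTTPAPV"
toy_other = "MSSVTT-API"
print(round(percent_identity(toy_ref, toy_other), 1))
```

Other conventions (for example, dividing by the length of the shorter sequence or counting gaps as mismatches) give slightly different percentages, so the denominator should always be stated alongside the value.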

Like those of BCV, TCV and MHV, there is one 
potential N-glycosylation site in the predicted HCV- 
OC43 M protein sequence (Asn 26); it is located near the 
N-terminal, presumably exposed portion of the mol- 
ecule. Two such exposed sites are found in IBV, TGEV 
and FIPV, whereas one of three sites in HCV-229E is 
predicted to be external. However, since the M protein 
has previously been shown to be O-glycosylated in both 
BCV (Deregt et al., 1987) and MHV (Holmes et al., 1981; 
Niemann & Klenk, 1981), we used the N-glycosylation 
inhibitor tunicamycin to verify the type of glycosylation 
found on HCV-OC43 glycoproteins. As shown in Fig. 3, 
virions produced in the presence of this drug were 
completely devoid of both S (180K, 105K and 80K 
forms) and HE (125K and 65K forms) proteins, but still 
contained the three forms (28K, 27K and 26K) of the 
M protein that have previously been shown to be 
glycosylated (Hogue & Brian, 1986). Therefore, it appears 
that the M protein of HCV-OC43 is also O-glycosylated. 
Indeed, six potential O-glycosylation sites are found in 
the external portion of the molecule (Fig. 1). These sites 
are conserved in both BCV and MHV (Fig. 2). 

Fig. 3. Effect of tunicamycin on HCV-OC43 proteins. Radiolabelled virions released from HRT-18 cells treated (lanes 2 and 4) or not 
treated (lanes 1 and 3) with tunicamycin were purified using one (lanes 1 and 2) or two (lanes 3 and 4) Renografin-60 density gradients, 
and their proteins were identified by SDS-PAGE in the presence (0) or absence (a) of 2-mercaptoethanol, followed by fluorography (10 
day exposure). Mr standards were run on the same gels (lanes M) and their sizes are indicated to the left. Viral proteins are identified 
either by their apparent Mr or their accepted designation in the case of the major structural proteins: S (180K), S1 and S2 subunits (105K 
and 80K), HE (125K in unreduced form and 65K in reduced form), N (55K) and M (26K, 27K and 28K). 

The extensive identity of the sequence of the M 
proteins of HCV-OC43 and BCV confirms previous 
reports of a close relationship between these two viruses 
revealed by serological analysis (Pedersen et al., 1978; 
Gerna et al., 1981), immunoprecipitation of virion 
structural proteins (Hogue et al., 1984), oligonucleotide 
fingerprinting of genomic RNA (Lapps & Brian, 1985) 
and phylogenetic analysis of the HE gene (Zhang et al., 
1992). Moreover, alignment of the sequences of the N 
proteins of HCV-OC43 and BCV has revealed 97.5% 
identity (Kamahora et al., 1989). We now find 94% 
identity between the M proteins of HCV-OC43 and 
BCV. Together, these findings suggest that these two 
viruses have diverged from each other only recently. 
Interestingly, these viruses share the same target cell 
specificity in vitro, but apparently show different 
tropisms in vivo. Furthermore, HCV-OC43 causes mainly 
respiratory illness in man, whereas BCV affects mainly 
the gastrointestinal system of cattle. The causes of these 
differences are not known. However, differences 
between the two viruses have been detected at the level of 
some non-structural protein genes (S. Mounir & P. J. 
Talbot, unpublished results). Sequence variations of the 
S glycoprotein could also be involved, as suggested 
previously for the murine and porcine coronaviruses 
(Parker et al., 1989; Rasschaert et al., 1990). The 
possibility remains that an HCV-OC43-like coronavirus 
could be involved in enteric infections (Gerna et al., 
1985), although serologically unrelated HCVs have also 
been reported (Resta et al., 1985). 

The intergenic region UCCAAAC upstream of the M 
gene of HCV-OC43 has also been identified for the M 
genes of BCV (Lapps et al., 1987) and two strains of 
MHV (Armstrong et al., 1984; Pfleiderer et al., 1986). 
The only other reported occurrence of this exact 
sequence is in mRNAs 4 and 5 of BCV (Abraham et al., 
1990). This conserved sequence differs by only one nucleotide 
from the postulated UCUAAAC consensus 
leader RNA binding site found in 23 of 33 published 
animal coronavirus gene sequences (data not shown). 
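
The single difference between the two heptamers can be made explicit with a position-by-position comparison. The sketch below uses only the two motifs quoted in the text; the function name mismatches is illustrative.

```python
CONSENSUS = "UCUAAAC"      # postulated consensus leader RNA binding site
M_GENE_MOTIF = "UCCAAAC"   # motif upstream of the HCV-OC43 M gene


def mismatches(a: str, b: str) -> list:
    """Return (1-based position, base in a, base in b) for each mismatch."""
    if len(a) != len(b):
        raise ValueError("motifs must have equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = mismatches(CONSENSUS, M_GENE_MOTIF)
print(diffs)  # [(3, 'U', 'C')]
```

The only mismatch is at the third position, where the consensus U is replaced by C.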

Comparison of the hydropathy profiles of all known 
coronavirus M proteins (data not shown) shows that the 
expected membrane topology (three hydrophobic do- 
mains) is likely to resemble the proposed model (Rottier 
et al., 1986; Armstrong et al., 1984). Most of the basic 
amino acids are present in the C-terminal half of the 
protein, and therefore might interact with the negatively 
charged RNA and the acidic residues of the N protein 
(Sturman et al., 1980). Interestingly, only TGEV and 
FIPV possess an N-terminal hydrophobic sequence. 
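
A hydropathy profile of the kind referred to above can be sketched as a sliding-window average of Kyte & Doolittle (1982) hydropathy values. The window length used for the present analysis is not stated; 19 residues, a value commonly chosen for membrane-spanning segments, is assumed here. The input is a hypothetical toy sequence with one hydrophobic stretch, not an M protein sequence.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard residues.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]  # KeyError on non-standard residues
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical toy sequence: charged flanks around a 19-residue apolar core.
toy = "MSDKR" + "LIVAFLLIVAFLLIVAFLL" + "KRDES"
profile = hydropathy_profile(toy, window=19)
peak = max(range(len(profile)), key=profile.__getitem__)
print(peak + 1, round(profile[peak], 2))  # 1-based start of the best window
```

In such a profile, candidate transmembrane domains appear as windows with a strongly positive mean; three such peaks would be expected for the topology described here.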

The glycosylation of the HCV-OC43 M protein was 
not sensitive to the inhibitory effect of tunicamycin, 
which is indicative of O-glycosylation, as has been 
reported previously for MHV (Holmes et al., 1981; 
Niemann & Klenk, 1981) and BCV (Deregt et al., 1987). 
On the other hand, both the S and the HE proteins of 
HCV-OC43 were sensitive to this drug and therefore are 
likely to be N-glycosylated. Interestingly, the non- 
glycosylated precursors of these proteins could not be 
detected on purified virions, which is consistent with the 
absence of non-glycosylated S protein from MHV virions 
(Holmes et al., 1981). Our results suggest that the HE 
protein is also dispensable for the formation of the viral 
envelope and virus maturation and release, unlike the M 
protein. 

As shown in Fig. 3, other apparently structural viral 
proteins were observed in addition to the three envelope 
glycoproteins and the N protein. Of the large proteins, 
those with apparent Mr values of 240K, 155K and 38K have 
also been observed in BCV and associated with S, N and 
M proteins, respectively (Hogue & Brian, 1986). We also 
found small proteins (22.5K and 17K) and a 45K 
molecule, which appears to be the reduced form of a 
larger protein, possibly p240, which disappears upon 
reduction. 

Six potential O-glycosylation sites are observed within 
the first 28 N-terminal residues of the HCV-OC43 M 
protein. Moreover, there is one potential site for N- 
glycosylation in this region. This site is conserved at the 
same relative position in all known sequences of 
coronavirus M proteins except those of IBV and HCV- 
229E (Fig. 2), although it is apparently not utilized in 
HCV-OC43, BCV and MHV. 
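
Candidate glycosylation sites of this kind can be located by a simple pattern scan. The sketch below assumes the usual N-glycosylation sequon, Asn-X-Ser/Thr with X not Pro, and, as a deliberate simplification, treats every Ser or Thr in the N-terminal window as a potential O-glycosylation site; it takes no account of surface accessibility. The input is a hypothetical toy peptide, not the HCV-OC43 sequence.

```python
import re

# Asn-X-Ser/Thr with X != Pro; the lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(seq: str) -> list:
    """1-based positions of Asn residues within an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def ser_thr_positions(seq: str, n_term: int = 28) -> list:
    """1-based positions of Ser/Thr within the first n_term residues.

    Simplification: every Ser/Thr is counted as a candidate O-linked site.
    """
    return [i + 1 for i, aa in enumerate(seq.upper()[:n_term]) if aa in "ST"]


toy_peptide = "MANGSTPNPSA"  # hypothetical example
print(n_glycosylation_sites(toy_peptide))  # [3]
print(ser_thr_positions(toy_peptide))      # [5, 6, 10]
```

In the toy peptide, the Asn at position 8 is not reported because it is followed by Pro, which illustrates why a sequon match is a necessary but not a sufficient indication that a site is used.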

The relative conservation of the M proteins of 
coronaviruses suggests that structural constraints on this 
protein are rigid, resulting in more limited evolution of 
this protein. The study of the remainder of the genome of 
HCV-OC43 should yield important information on the 
replication, tropism and pathogenesis of this important 
human pathogen. Such studies are in progress. 